TOPIC GROUPING BASED ON DESCRIPTION TEXT IN MICROSOFT RESEARCH VIDEO DESCRIPTION CORPUS DATA USING FASTTEXT, PCA AND K-MEANS CLUSTERING

Abstract

This research groups the topics of the Microsoft Research Video Description Corpus (MRVDC) based on the text descriptions in its Indonesian-language dataset. The MRVDC is a video dataset developed by Microsoft Research, which contains paraphrased event expressions in English and other languages. The results of grouping these topics show the patterns of similarity and the interrelationships between topics from different video data, which will be useful for topic-based video retrieval. The topic grouping process uses fastText as the word embedding, PCA as the feature reduction method, and K-means as the clustering method. The experiment uses 1959 videos with 43753 text descriptions and varies the number of clusters k, with and without PCA. The result shows that the optimal k is 180, with a silhouette coefficient of 0.123115.
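The selection of k by silhouette coefficient can be illustrated with a small sketch. This is not the authors' implementation: it skips the fastText and PCA steps, uses made-up 2-D points in place of description embeddings, and swaps in a deterministic farthest-first initialization for K-means so the run is repeatable. The names `farthest_first`, `kmeans`, `silhouette`, `best_k`, and `toy` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(points, k):
    """Deterministic init (an assumption, not from the paper): start at the
    first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (between -1 and 1)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, candidates):
    """Score every candidate k and return the best one plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Toy data (made up): three well-separated groups of four points each.
toy = [(cx + dx, cy + dy)
       for cx, cy in [(0, 0), (10, 0), (0, 10)]
       for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]]

chosen_k, scores = best_k(toy, range(2, 6))
print(chosen_k, round(scores[chosen_k], 3))
```

On well-separated toy groups the best score is close to 1. The much lower value reported in the abstract (0.123115 at k = 180) indicates clusters that overlap considerably more than these.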

Similar resources

Fast Support Vector Data Description Using K-Means Clustering

Support Vector Data Description (SVDD) has a limitation when dealing with a large data set: the computational load increases drastically as the training data size grows. To handle this problem, we propose a new fast SVDD method using the K-means clustering method. Our method uses a divide-and-conquer strategy; it trains each decomposed subproblem to get support vectors and retrains with the suppo...

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposed an improved version of K-Means algorithm, namely Persistent K...

Distributed PCA and k-Means Clustering

This paper proposes a distributed PCA algorithm, with the theoretical guarantee that any good approximation solution on the projected data for k-means clustering is also a good approximation on the original data, while the projected dimension required is independent of the original dimension. When combined with the distributed coreset-based clustering approach in [3], this leads to an algorithm...

Extraction based approach for text summarization using k-means clustering

This paper describes an algorithm that incorporates k-means clustering, term frequency-inverse document frequency and tokenization to perform extraction-based text summarization.

Journal

Journal title: JIP (Jurnal Informatika Polinema)

Year: 2023

ISSN: 2614-6371, 2407-070X

DOI: https://doi.org/10.33795/jip.v9i2.1271